Voicing classification of visual speech using convolutional neural networks

نویسندگان

Thomas Le Cornu

Ben P. Milner

چکیده

The application of neural network and convolutional neural network (CNN) architectures is explored for the tasks of voicing classification (classifying frames as being either non-speech, unvoiced, or voiced) and voice activity detection (VAD) of visual speech. Experiments are conducted for both speaker dependent and speaker independent scenarios. A Gaussian mixture model (GMM) baseline system is developed using standard image-based two-dimensional discrete cosine transform (2D-DCT) visual speech features, achieving speaker dependent accuracies of 79 % and 94 %, for voicing classification and VAD respectively. Additionally, a singlelayer neural network system trained using the same visual features achieves accuracies of 86 % and 97 %. A novel technique using convolutional neural networks for visual speech feature extraction and classification is presented. The voicing classification and VAD results using the system are further improved to 88 % and 98 % respectively. The speaker independent results show the neural network system to outperform both the GMM and CNN systems, achieving accuracies of 63 % for voicing classification, and 79 % for voice activity detection.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study

Despite considerable enhances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...

متن کامل

Provide a Deep Convolutional Neural Network Optimized with Morphological Filters to Map Trees in Urban Environments Using Aerial Imagery

Today, we cannot ignore the role of trees in the quality of human life, so that the earth is inconceivable for humans without the presence of trees. In addition to their natural role, urban trees are also very important in terms of visual beauty. Aerial imagery using unmanned platforms with very high spatial resolution is available today. Convolutional neural networks based deep learning method...

متن کامل

A New Method to Improve Automated Classification of Heart Sound Signals: Filter Bank Learning in Convolutional Neural Networks

Introduction: Recent studies have acknowledged the potential of convolutional neural networks (CNNs) in distinguishing healthy and morbid samples by using heart sound analyses. Unfortunately the performance of CNNs is highly dependent on the filtering procedure which is applied to signal in their convolutional layer. The present study aimed to address this problem by a...

متن کامل

شبکه عصبی پیچشی با پنجره‌های قابل تطبیق برای بازشناسی گفتار

Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

متن کامل

Decision Support System for Age-Related Macular Degeneration Using Convolutional Neural Networks

Introduction: Age-related macular degeneration (AMD) is one of the major causes of visual loss among the elderly. It causes degeneration of cells in the macula. Early diagnosis can be helpful in preventing blindness. Drusen are the initial symptoms of AMD. Since drusen have a wide variety, locating them in screening images is difficult and time-consuming. An automated digital fundus photography...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2015

Voicing classification of visual speech using convolutional neural networks

نویسندگان

چکیده

منابع مشابه

Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study

Provide a Deep Convolutional Neural Network Optimized with Morphological Filters to Map Trees in Urban Environments Using Aerial Imagery

A New Method to Improve Automated Classification of Heart Sound Signals: Filter Bank Learning in Convolutional Neural Networks

شبکه عصبی پیچشی با پنجره‌های قابل تطبیق برای بازشناسی گفتار

Decision Support System for Age-Related Macular Degeneration Using Convolutional Neural Networks

عنوان ژورنال:

اشتراک گذاری